Subgroup Identification Analysis

GBSG Example

Author

Larry Leon

Code
# Set options
knitr::opts_chunk$set(
  echo = TRUE,
  warning = FALSE,
  message = FALSE,
  fig.align = 'center',
  fig.retina = 2
)
rm(list=ls())
library(tinytex)
Warning: package 'tinytex' was built under R version 4.5.2
Code
library(ggplot2)

#library(table1)

library(gt)

library(survival)
library(data.table)
library(randomForest)
library(grf)
library(policytree)
library(DiagrammeR)

#library(grid)
#library(forestploter)

#library(randomizr)
# library(devtools)
# install_github("larry-leon/weightedsurv", force = TRUE)
#install.packages("weightedsurv")
# install_github("larry-leon/forestsearch", force = TRUE)


library(forestsearch)
library(weightedsurv)

#help(forestsearch)
#help(generate_aft_dgm_flex)

# Set theme for plots
theme_set(theme_minimal(base_size = 12))

1 Summary

Reproducing main GBSG analysis

1.1 Datasetup

Code
df.analysis <- gbsg
df.analysis <- within(df.analysis,{
id <- as.numeric(c(1:nrow(df.analysis)))  
# time to months
time_months <- rfstime/30.4375
grade3 <- ifelse(grade=="3",1,0)
treat <- hormon
})
confounders.name <- c("age","meno","size","grade3","nodes","pgr","er")
outcome.name <- c("time_months")
event.name <- c("status")
id.name <- c("id")
treat.name <- c("hormon")

1.2 Kaplan-Meier curves and baseline summary

Code
dfcount <- df_counting(
  df = df.analysis,
  by.risk = 6,
  tte.name = outcome.name, 
  event.name = event.name, 
  treat.name = treat.name
)
plot_weighted_km(dfcount, conf.int = TRUE, show.logrank = TRUE, ymax = 1.05, xmed.fraction = 0.775, ymed.offset = 0.125)

Code
create_summary_table(data = df.analysis, treat_var = treat.name, 
                     table_title = "GBSG Characteristics by Treatment Arm",
                                      vars_continuous=c("age","nodes","size","er","pgr"),
                                      vars_categorical=c("grade","grade3"),
                                      font_size = 12)
GBSG Characteristics by Treatment Arm
Characteristic Control (n=440) Treatment (n=246) P-value1 SMD2
age Mean (SD) 51.1 (10.0) 56.6 (9.4) <0.001 0.57
nodes Mean (SD) 4.9 (5.6) 5.1 (5.3) 0.665 0.03
size Mean (SD) 29.6 (14.4) 28.8 (14.1) 0.470 0.06
er Mean (SD) 79.7 (124.2) 125.8 (191.1) <0.001 0.30
pgr Mean (SD) 102.0 (170.0) 124.3 (249.7) 0.213 0.11
grade 0.273 0.06
1 48 (10.9%) 33 (13.4%)
2 281 (63.9%) 163 (66.3%)
3 111 (25.2%) 50 (20.3%)
grade3 0.174 0.05
0 329 (74.8%) 196 (79.7%)
1 111 (25.2%) 50 (20.3%)
1 P-values: t-test for continuous, chi-square/Fisher's exact for categorical/binary variables
2 SMD = Standardized mean difference (Cohen's d for continuous, Cramer's V for categorical)

1.3 GRF analysis

Code
## GRF
grf_est1 <- grf.subg.harm.survival(data=df.analysis,
confounders.name = confounders.name,
outcome.name=outcome.name, event.name=event.name, id.name=id.name, treat.name=treat.name,
maxdepth = 2, n.min = 60, dmin.grf = 12, frac.tau=0.6, details=TRUE)
tau, maxdepth = 46.75811 2 
   leaf.node control.mean control.size control.se depth
1          2         6.49        82.00       3.34     1
2          3        -4.10       604.00       1.06     1
11         4        -7.90       112.00       2.81     2
21         5         3.86       177.00       1.87     2
4          7        -5.89       356.00       1.33     2

Selected subgroup:
  leaf.node control.mean control.size control.se depth
1         2         6.49        82.00       3.34     1

GRF subgroup found
Terminating node at max.diff (sg.harm.id):
[1] "er <= 0"

All splits:
[1] "er <= 0"   "age <= 50" "age <= 43"
Code
# NOTE: In general for GRF trees
# leaf1 --> recommend control
# leaf2 --> recommend treatment
# Tree depth 1
plot(grf_est1$tree1,leaf.labels=c("Control","Treat"))
Code
# Tree depth 2
plot(grf_est1$tree2,leaf.labels=c("Control","Treat"))

1.4 Forestsearch with depth=2 (maxk = 2)

Code
# Setup parallel processing
library(doFuture)
library(doRNG)

registerDoFuture()
registerDoRNG()

system.time({fs <- forestsearch(df.analysis,  confounders.name = confounders.name,
                                outcome.name = "time_months", treat.name = "hormon", event.name = "status", id.name = "id",
                                potentialOutcome.name = NULL, 
                                df.test = NULL,
                                flag_harm.name = NULL,
                                hr.threshold = 1.25, hr.consistency = 1.0, pconsistency.threshold = 0.90,
                                sg_focus = "maxSG", max_subgroups_search = 30,
                                showten_subgroups = TRUE, details=TRUE,
                                conf_force = c("age <= 65"),
                                cut_type = "default", use_grf = TRUE, plot.grf = TRUE, use_lasso = TRUE,
                                maxk = 2, 
                                n.min = 60, d0.min = 12, d1.min = 12,
                                plot.sg = TRUE, by.risk = 6,
                                parallel_args = list(plan="callr", workers = 30, show_message = TRUE)
)
})
GRF stage for cut selection with dmin,tau= 12 0.6 
tau, maxdepth = 46.75811 2 
   leaf.node control.mean control.size control.se depth
1          2         6.49        82.00       3.34     1
2          3        -4.10       604.00       1.06     1
11         4        -7.90       112.00       2.81     2
21         5         3.86       177.00       1.87     2
4          7        -5.89       356.00       1.33     2

Selected subgroup:
  leaf.node control.mean control.size control.se depth
1         2         6.49        82.00       3.34     1

GRF subgroup found
Terminating node at max.diff (sg.harm.id):
[1] "er <= 0"

All splits:
[1] "er <= 0"   "age <= 50" "age <= 43"
# of continuous/categorical characteristics 5 2 
Continuous characteristics: age size nodes pgr er 
Categorical characteristics: meno grade3 
## Prior to lasso: age size nodes pgr er 
#### Lasso selection results 
7 x 1 sparse Matrix of class "dgCMatrix"
                 s0
age     .          
meno    .          
size    0.005433435
grade3  0.178139021
nodes   0.049670523
pgr    -0.001812895
er      .          
Cox-LASSO selected: size grade3 nodes pgr 
Cox-LASSO not selected: age meno er 
### End Lasso selection 
## After lasso: size nodes pgr 
Default cuts included from Lasso: size <= mean(size) size <= median(size) size <= qlow(size) size <= qhigh(size) nodes <= mean(nodes) nodes <= median(nodes) nodes <= qlow(nodes) nodes <= qhigh(nodes) pgr <= mean(pgr) pgr <= median(pgr) pgr <= qlow(pgr) pgr <= qhigh(pgr) 
Categorical after Lasso: grade3 
Factors per GRF: er <= 0 age <= 50 age <= 43 
Initial GRF cuts included er <= 0 age <= 50 age <= 43 
Factors included per GRF (not in lasso) er <= 0 age <= 50 age <= 43 

===== CONSOLIDATED CUT EVALUATION (IMPROVED) =====
Evaluating 17 cut expressions once and caching...
Cut evaluation summary:
  Total cuts:  17 
  Valid cuts:  17 
  Errors:  0 
✓ All 17 factors validated as 0/1
===== END CONSOLIDATED CUT EVALUATION =====

# of candidate subgroup factors= 17 
 [1] "er <= 0"      "age <= 50"    "age <= 43"    "age <= 65"    "size <= 29.3"
 [6] "size <= 25"   "size <= 20"   "size <= 35"   "nodes <= 5"   "nodes <= 3"  
[11] "nodes <= 1"   "nodes <= 7"   "pgr <= 110"   "pgr <= 32.5"  "pgr <= 7"    
[16] "pgr <= 131.8" "grade3"      
Number of factors evaluated= 17 
Confounders per grf screening q1 q2 q3 q17 q5 q6 q15 q11 q10 q13 q8 q9 q7 q14 q12 q16 q4 
        Factors Labels VI(grf)
1       er <= 0     q1  0.1345
2     age <= 50     q2  0.1290
3     age <= 43     q3  0.1213
17       grade3    q17  0.0663
5  size <= 29.3     q5  0.0631
6    size <= 25     q6  0.0559
15     pgr <= 7    q15  0.0521
11   nodes <= 1    q11  0.0497
10   nodes <= 3    q10  0.0455
13   pgr <= 110    q13  0.0451
8    size <= 35     q8  0.0436
9    nodes <= 5     q9  0.0414
7    size <= 20     q7  0.0406
14  pgr <= 32.5    q14  0.0406
12   nodes <= 7    q12  0.0383
16 pgr <= 131.8    q16  0.0223
4     age <= 65     q4  0.0108
Number of possible configurations (<= maxk): maxk = 2 , # combinations = 595 
Events criteria: control >= 12 , treatment >= 12 
Subgroup search completed in 0.01 minutes
Found 13 subgroup candidate(s)
# of candidate subgroups (meeting all criteria) = 13 
# of unique initial candidates: 13 
# Restricting to top stop_Kgroups = 30 
# of candidates restricted to 'top K': 13 
Using parallel processing: callr with 30 workers
*** Not met: Subgroup, % Consistency = {age <= 50} {pgr <= 110} 0.68 
*** Not met: Subgroup, % Consistency = {age <= 50} !{age <= 43} 0.82 
*** Not met: Subgroup, % Consistency = {age <= 50} {pgr <= 32.5} 0.73 
Consistency met!
# of splits = 1000 
**** Subgroup, % Consistency Met= {er <= 0} 0.96 
Consistency met!
# of splits = 1000 
**** Subgroup, % Consistency Met= {er <= 0} {age <= 65} 0.94 
Consistency met!
# of splits = 1000 
**** Subgroup, % Consistency Met= {er <= 0} {pgr <= 32.5} 0.98 
*** Not met: Subgroup, % Consistency = {grade3} {pgr <= 7} 0.87 
*** Not met: Subgroup, % Consistency = {age <= 50} {pgr <= 7} 0.81 
*** Not met: Subgroup, % Consistency = !{size <= 35} {nodes <= 5} 0.54 
Consistency met!
# of splits = 1000 
**** Subgroup, % Consistency Met= {er <= 0} !{age <= 43} 0.96 
Consistency met!
# of splits = 1000 
**** Subgroup, % Consistency Met= {er <= 0} {pgr <= 7} 0.93 
Consistency met!
# of splits = 1000 
**** Subgroup, % Consistency Met= {er <= 0} {size <= 35} 0.99 
Consistency met!
# of splits = 1000 
**** Subgroup, % Consistency Met= {er <= 0} !{size <= 20} 0.93 

*** Subgroup found: {er <= 0} 
% consistency criteria met= 0.96 
SG focus= maxSG 
Subgroup Consistency Minutes= 0.05548333 
Subgroup found (FS) with sg_focus='maxSG'
Selected subgroup: {er <= 0} 
Minutes forestsearch overall = 0.07 
   user  system elapsed 
 37.458   1.916   4.471 
Code
plan("sequential")


# Results for estimation (training) data, which_df = "est" is default
res_tabs <- sg_tables(fs, ndecimals = 3, which_df = "est")

res_tabs$sg10_out
Identified Subgroups
Two-factor subgroups (maxk=2)
Factor 1 Factor 2 N Events E1 HR Pcons
{er <= 0} 82 45 16 1.951 0.960
{er <= 0} {age <= 65} 76 40 14 1.909 0.940
{er <= 0} {pgr <= 32.5} 75 41 16 2.222 0.980
{er <= 0} !{age <= 43} 68 38 14 2.164 0.960
{er <= 0} {pgr <= 7} 64 34 13 1.992 0.930
{er <= 0} {size <= 35} 61 34 15 2.537 0.990
{er <= 0} !{size <= 20} 61 35 12 2.054 0.930
Search Configuration: Single-factor candidates (L) = 34; Maximum combinations evaluated = 595; Search depth (maxk) = 2
Search Results: Candidate subgroups found = 13; Maximum HR estimate = 2.54
Note: E1 = events in treatment arm; Pcons = consistency proportion
Code
res_tabs$tab_estimates
Treatment Effect Estimates
Training data estimates
Subgroup n n1 events m1 m0 RMST HR (95% CI)
ITT 686 (100.0%) 246 (35.9%) 299 (43.6%) 66.3 50.2 7.8 0.69 (0.54, 0.89)
Questionable 82 (12.0%) 26 (31.7%) 45 (54.9%) 22.9 43.7 -14 1.95 (1.04, 3.67)
Recommend 604 (88.0%) 220 (36.4%) 254 (42.1%) 66.7 52.6 9.3 0.61 (0.47, 0.80)

1.5 Bootstrap Inference

Code
#output_dir <- "dev/vignettes-working/applications/gbsg/results"
output_dir <- "results/"
save_results <- dir.exists(output_dir)

# patchhwork needed for a combined bootstrap plot (otherwise if not avaialable will not produce)
library(patchwork)

# Number of bootstrap samples
NB <- 1000

system.time({fs_bc <- forestsearch_bootstrap_dofuture(
  fs.est = fs, 
  nb_boots = NB, 
  show_three = FALSE, 
  details = TRUE)
})

plan("sequential")


if (save_results) {
    filename <- file.path(output_dir, 
                         paste0("gbsg_k2_B=", 
                                format(NB), 
                                ".RData"))
    save(fs_bc, fs, file = filename)
    cat("\nResults saved to:", filename, "\n")
}

1.5.1 Diagnostics and Summaries

Code
#load("~/Documents/GitHub/forestsearch/vignettes/results/sim_gbsg_example_B=1000.RData")

output_dir <- "results/"


load_results <- dir.exists(output_dir)
NB <- 1000
if(load_results){
filename <- file.path(output_dir, 
                         paste0("gbsg_k2_B=", 
                                format(NB),".RData"))

load(file = filename)
}

summaries <- summarize_bootstrap_results(
      sgharm = fs$sg.harm,
      boot_results = fs_bc,
      create_plots = TRUE,
      est.scale = "hr"
    )

===============================================================
           BOOTSTRAP ANALYSIS SUMMARY                          
===============================================================

BOOTSTRAP SUCCESS METRICS:
-------------------------------------------------------------
  Total iterations:              1000
  Successful subgroup ID:        867 (86.7%)
  Failed to find subgroup:       133 (13.3%)

TIMING ANALYSIS:
-------------------------------------------------------------
Overall:
  Total bootstrap time:          58.82 minutes (0.98 hours)
  Average per iteration:         0.06 min (3.5 sec)

Per-iteration timing:
  Mean:                          0.56 min (33.4 sec)
  Median:                        0.66 min (39.7 sec)
  Std Dev:                       0.26 minutes
  Range:                         [0.01, 0.98] minutes
  IQR:                           [0.34, 0.78] minutes

ForestSearch timing (successful iterations only):
  Iterations with FS:            1000 (100.0%)
  Mean FS time:                  0.56 min (33.4 sec)
  Median FS time:                0.66 min (39.7 sec)
  Total FS time:                 556.56 minutes
  FS time % of total:            946.2%

Overhead timing (Cox models, bias correction, etc.):
  Mean overhead:                 0.00 min (0.0 sec)
  Median overhead:               0.00 min (0.0 sec)
  Total overhead:                0.14 minutes
  Overhead % of total:           0.2%

PERFORMANCE ASSESSMENT:
-------------------------------------------------------------
  Performance rating:            ✓✓✓ Excellent
  Average iteration speed:       3.5 seconds

===============================================================
Code
sg_tab <- summaries$table

sg_tab
Treatment Effect by Subgroup
Bootstrap bias-corrected estimates (1000 iterations)
Subgroup
Sample Size
Survival
Treatment Effect
N NT Events MedT MedC RMSTd HR
(95% CI)1
HR
(95% CI)2
Qstnbl 82 (12.0%) 26 (31.7%) 45 (54.9%) 22.9 43.7 -14 1.95 (1.04, 3.67) 1.58 (0.86,2.91)
Recmnd 604 (88.0%) 220 (36.4%) 254 (42.1%) 66.7 52.6 9.3 0.61 (0.47, 0.80) 0.65 (0.46,0.93)
1 Unadjusted HR: Standard Cox regression hazard ratio with robust standard errors
2 Bias-corrected HR: Bootstrap-adjusted estimate using infinitesimal jacknife method (1000 iterations). Corrects for optimism in subgroup selection.
Note: Med = Median survival time (months). RMSTd = Restricted mean survival time difference. Subgroup identified in 86.7% of bootstrap samples.
Code
event_summary <- summarize_bootstrap_events(fs_bc, threshold = 12)

=== Bootstrap Event Count Summary ===
Total bootstrap iterations: 1000
Event threshold: <12 events

ORIGINAL Subgroup H on BOOTSTRAP samples:
  Control arm <12 events: 0 (0.0%)
  Treatment arm <12 events: 0 (0.0%)
  Either arm <12 events: 0 (0.0%)

ORIGINAL Subgroup Hc on BOOTSTRAP samples:
  Control arm <12 events: 0 (0.0%)
  Treatment arm <12 events: 0 (0.0%)
  Either arm <12 events: 0 (0.0%)

NEW Subgroups found: 867 (86.7%)

NEW Subgroup H* on ORIGINAL data:
  Control arm <12 events: 0 (0.0% of successful)
  Treatment arm <12 events: 0 (0.0% of successful)
  Either arm <12 events: 0 (0.0% of successful)

NEW Subgroup Hc* on ORIGINAL data:
  Control arm <12 events: 0 (0.0% of successful)
  Treatment arm <12 events: 2 (0.2% of successful)
  Either arm <12 events: 2 (0.2% of successful)
Code
summaries$diagnostics_table_gt
Bootstrap Diagnostics Summary
Analysis of 1000 bootstrap iterations
Category Metric Value
Success Rate1 Total iterations 1000
Successful subgroup ID 867 (86.7%)
Failed to find subgroup 133 (13.3%)
Success rating Good ✓✓
Subgroup H (Questionable) Unadjusted estimate 1.95 (1.04, 3.67)
Bias-corrected estimate 1.58 (0.86, 2.91)
Bias correction impact2 19.0%
CI width change3 2.64 -> 2.06
Subgroup Hc (Recommend) Unadjusted estimate 0.61 (0.47, 0.80)
Bias-corrected estimate 0.65 (0.46, 0.93)
Bias correction impact2 6.1%
CI width change3 0.33 -> 0.47
Bootstrap Quality: H Valid iterations 867
Mean (SD) 0.46 (0.40)
Coefficient of variation4 86.4%
Skewness5 -0.00
Bootstrap Quality: Hc Valid iterations 867
Mean (SD) -0.43 (0.22)
Coefficient of variation4 50.4%
Skewness5 0.13
Search Performance Mean max HR found 3.13 (1.25)
Mean factors evaluated 42.2
Mean combinations tried 944
Proportion at maxk --
1 Success Rate: Proportion of bootstrap samples where ForestSearch identified a valid subgroup
2 Bias Correction Impact: Percentage change from unadjusted to bias-corrected estimate
3 CI Width Change: Confidence interval width before -> after bias correction
4 Coefficient of Variation: Standard deviation as % of mean (lower is better)
5 Skewness: Measure of asymmetry (0 = symmetric, |skew| < 1 is generally good)
Interpretation Guide:

Good stability: Subgroup is reliably identified in most bootstrap samples.

High variability: Bootstrap estimates are imprecise (CV >= 25%). Consider increasing nb_boots or sample size.

Code
summaries$subgroup_summary$original_agreement
                            Metric       Value
                            <char>      <char>
1:      Total bootstrap iterations        1000
2:           Successful iterations         867
3: Failed iterations (no subgroup)         133
4:       Exact match with original   62 (7.2%)
5:         Different from original 805 (92.8%)
Code
summaries$subgroup_summary$factor_presence
  Rank Factor Count   Percent
6    1    pgr   482 55.594002
1    2    age   347 40.023068
7    3   size   342 39.446367
2    4     er   257 29.642445
5    5  nodes    89 10.265283
3    6 grade3    67  7.727797
4    7   meno    41  4.728950
Code
summaries$subgroup_summary$factor_presence_specific
   Rank Base_Factor Factor_Definition Count  Percent
83    1        size     !{size <= 25}   110 12.68743
Code
summaries$plots$combined

1.6 Forestsearch with depth=1 (maxk = 1)

Code
# Setup parallel processing
library(doFuture)
library(doRNG)

registerDoFuture()
registerDoRNG()

system.time({fs <- forestsearch(df.analysis,  confounders.name = confounders.name,
                                outcome.name = "time_months", treat.name = "hormon", event.name = "status", id.name = "id",
                                potentialOutcome.name = NULL, 
                                df.test = NULL,
                                flag_harm.name = NULL,
                                hr.threshold = 1.25, hr.consistency = 1.0, pconsistency.threshold = 0.90,
                                sg_focus = "maxSG", max_subgroups_search = 30,
                                showten_subgroups = TRUE, details=TRUE,
                                conf_force = c("age <= 65"),
                                cut_type = "default", use_grf = TRUE, plot.grf = TRUE, use_lasso = TRUE,
                                maxk = 1, 
                                n.min = 60, d0.min = 12, d1.min = 12,
                                plot.sg = TRUE, by.risk = 6,
                                parallel_args = list(plan="callr", workers = 30, show_message = TRUE)
)
})
GRF stage for cut selection with dmin,tau= 12 0.6 
tau, maxdepth = 46.75811 2 
   leaf.node control.mean control.size control.se depth
1          2         6.49        82.00       3.34     1
2          3        -4.10       604.00       1.06     1
11         4        -7.90       112.00       2.81     2
21         5         3.86       177.00       1.87     2
4          7        -5.89       356.00       1.33     2

Selected subgroup:
  leaf.node control.mean control.size control.se depth
1         2         6.49        82.00       3.34     1

GRF subgroup found
Terminating node at max.diff (sg.harm.id):
[1] "er <= 0"

All splits:
[1] "er <= 0"   "age <= 50" "age <= 43"
# of continuous/categorical characteristics 5 2 
Continuous characteristics: age size nodes pgr er 
Categorical characteristics: meno grade3 
## Prior to lasso: age size nodes pgr er 
#### Lasso selection results 
7 x 1 sparse Matrix of class "dgCMatrix"
                 s0
age     .          
meno    .          
size    0.005433435
grade3  0.178139021
nodes   0.049670523
pgr    -0.001812895
er      .          
Cox-LASSO selected: size grade3 nodes pgr 
Cox-LASSO not selected: age meno er 
### End Lasso selection 
## After lasso: size nodes pgr 
Default cuts included from Lasso: size <= mean(size) size <= median(size) size <= qlow(size) size <= qhigh(size) nodes <= mean(nodes) nodes <= median(nodes) nodes <= qlow(nodes) nodes <= qhigh(nodes) pgr <= mean(pgr) pgr <= median(pgr) pgr <= qlow(pgr) pgr <= qhigh(pgr) 
Categorical after Lasso: grade3 
Factors per GRF: er <= 0 age <= 50 age <= 43 
Initial GRF cuts included er <= 0 age <= 50 age <= 43 
Factors included per GRF (not in lasso) er <= 0 age <= 50 age <= 43 

===== CONSOLIDATED CUT EVALUATION (IMPROVED) =====
Evaluating 17 cut expressions once and caching...
Cut evaluation summary:
  Total cuts:  17 
  Valid cuts:  17 
  Errors:  0 
✓ All 17 factors validated as 0/1
===== END CONSOLIDATED CUT EVALUATION =====

# of candidate subgroup factors= 17 
 [1] "er <= 0"      "age <= 50"    "age <= 43"    "age <= 65"    "size <= 29.3"
 [6] "size <= 25"   "size <= 20"   "size <= 35"   "nodes <= 5"   "nodes <= 3"  
[11] "nodes <= 1"   "nodes <= 7"   "pgr <= 110"   "pgr <= 32.5"  "pgr <= 7"    
[16] "pgr <= 131.8" "grade3"      
Number of factors evaluated= 17 
Confounders per grf screening q1 q2 q3 q17 q5 q6 q15 q11 q10 q13 q8 q9 q7 q14 q12 q16 q4 
        Factors Labels VI(grf)
1       er <= 0     q1  0.1345
2     age <= 50     q2  0.1290
3     age <= 43     q3  0.1213
17       grade3    q17  0.0663
5  size <= 29.3     q5  0.0631
6    size <= 25     q6  0.0559
15     pgr <= 7    q15  0.0521
11   nodes <= 1    q11  0.0497
10   nodes <= 3    q10  0.0455
13   pgr <= 110    q13  0.0451
8    size <= 35     q8  0.0436
9    nodes <= 5     q9  0.0414
7    size <= 20     q7  0.0406
14  pgr <= 32.5    q14  0.0406
12   nodes <= 7    q12  0.0383
16 pgr <= 131.8    q16  0.0223
4     age <= 65     q4  0.0108
Number of possible configurations (<= maxk): maxk = 1 , # combinations = 34 
Events criteria: control >= 12 , treatment >= 12 
Subgroup search completed in 0 minutes
Found 1 subgroup candidate(s)
# of candidate subgroups (meeting all criteria) = 1 
# of unique initial candidates: 1 
# Restricting to top stop_Kgroups = 30 
# of candidates restricted to 'top K': 1 
Using parallel processing: callr with 30 workers
Consistency met!
# of splits = 1000 
**** Subgroup, % Consistency Met= {er <= 0} 0.95 

*** Subgroup found: {er <= 0} 
% consistency criteria met= 0.95 
SG focus= maxSG 
Subgroup Consistency Minutes= 0.03283333 
Subgroup found (FS) with sg_focus='maxSG'
Selected subgroup: {er <= 0} 
Minutes forestsearch overall = 0.04 
   user  system elapsed 
  4.150   0.145   2.451 
Code
plan("sequential")


# Results for estimation (training) data, which_df = "est" is default
res_tabs <- sg_tables(fs, ndecimals = 3, which_df = "est")

res_tabs$sg10_out
Identified Subgroups
Single-factor subgroups (maxk=1)
Factor N Events E1 HR Pcons
{er <= 0} 82 45 16 1.951 0.950
Search Configuration: Single-factor candidates (L) = 34; Maximum combinations evaluated = 34; Search depth (maxk) = 1
Search Results: Candidate subgroups found = 1; Maximum HR estimate = 1.95
Note: E1 = events in treatment arm; Pcons = consistency proportion
Code
res_tabs$tab_estimates
Treatment Effect Estimates
Training data estimates
Subgroup n n1 events m1 m0 RMST HR (95% CI)
ITT 686 (100.0%) 246 (35.9%) 299 (43.6%) 66.3 50.2 7.8 0.69 (0.54, 0.89)
Questionable 82 (12.0%) 26 (31.7%) 45 (54.9%) 22.9 43.7 -14 1.95 (1.04, 3.67)
Recommend 604 (88.0%) 220 (36.4%) 254 (42.1%) 66.7 52.6 9.3 0.61 (0.47, 0.80)

1.7 Bootstrap Inference

Code
output_dir <- "results/"
save_results <- dir.exists(output_dir)

# patchhwork needed for a combined bootstrap plot (otherwise if not avaialable will not produce)
library(patchwork)

# Number of bootstrap samples
NB <- 1000

system.time({fs_bc <- forestsearch_bootstrap_dofuture(
  fs.est = fs, 
  nb_boots = NB, 
  show_three = FALSE, 
  details = TRUE)
})

plan("sequential")


if (save_results) {
    filename <- file.path(output_dir, 
                         paste0("gbsg_k1_B=", 
                                format(NB), 
                                ".RData"))
    save(fs_bc, fs, file = filename)
    cat("\nResults saved to:", filename, "\n")
}

1.7.1 Diagnostics and Summaries

Code
output_dir <- "results/"
load_results <- dir.exists(output_dir)
NB <- 1000
if(load_results){
filename <- file.path(output_dir, 
                         paste0("gbsg_k1_B=", 
                                format(NB),".RData"))

load(file = filename)
}

summaries <- summarize_bootstrap_results(
      sgharm = fs$sg.harm,
      boot_results = fs_bc,
      create_plots = TRUE,
      est.scale = "hr"
    )

===============================================================
           BOOTSTRAP ANALYSIS SUMMARY                          
===============================================================

BOOTSTRAP SUCCESS METRICS:
-------------------------------------------------------------
  Total iterations:              1000
  Successful subgroup ID:        477 (47.7%)
  Failed to find subgroup:       523 (52.3%)

TIMING ANALYSIS:
-------------------------------------------------------------
Overall:
  Total bootstrap time:          6.03 minutes (0.10 hours)
  Average per iteration:         0.01 min (0.4 sec)

Per-iteration timing:
  Mean:                          0.08 min (4.7 sec)
  Median:                        0.07 min (4.0 sec)
  Std Dev:                       0.07 minutes
  Range:                         [0.01, 0.58] minutes
  IQR:                           [0.02, 0.11] minutes

ForestSearch timing (successful iterations only):
  Iterations with FS:            1000 (100.0%)
  Mean FS time:                  0.08 min (4.7 sec)
  Median FS time:                0.07 min (4.0 sec)
  Total FS time:                 77.85 minutes
  FS time % of total:            1290.5%

Overhead timing (Cox models, bias correction, etc.):
  Mean overhead:                 0.00 min (0.0 sec)
  Median overhead:               0.00 min (0.0 sec)
  Total overhead:                0.19 minutes
  Overhead % of total:           3.1%

PERFORMANCE ASSESSMENT:
-------------------------------------------------------------
  Performance rating:            ✓✓✓ Excellent
  Average iteration speed:       0.4 seconds

===============================================================
Code
sg_tab <- summaries$table

sg_tab
Treatment Effect by Subgroup
Bootstrap bias-corrected estimates (1000 iterations)
Subgroup
Sample Size
Survival
Treatment Effect
N NT Events MedT MedC RMSTd HR
(95% CI)1
HR
(95% CI)2
Qstnbl 82 (12.0%) 26 (31.7%) 45 (54.9%) 22.9 43.7 -14 1.95 (1.04, 3.67) 1.34 (0.81,2.22)
Recmnd 604 (88.0%) 220 (36.4%) 254 (42.1%) 66.7 52.6 9.3 0.61 (0.47, 0.80) 0.64 (0.44,0.93)
1 Unadjusted HR: Standard Cox regression hazard ratio with robust standard errors
2 Bias-corrected HR: Bootstrap-adjusted estimate using infinitesimal jacknife method (1000 iterations). Corrects for optimism in subgroup selection.
Note: Med = Median survival time (months). RMSTd = Restricted mean survival time difference. Subgroup identified in 47.7% of bootstrap samples.
Code
event_summary <- summarize_bootstrap_events(fs_bc, threshold = 12)

=== Bootstrap Event Count Summary ===
Total bootstrap iterations: 1000
Event threshold: <12 events

ORIGINAL Subgroup H on BOOTSTRAP samples:
  Control arm <12 events: 0 (0.0%)
  Treatment arm <12 events: 0 (0.0%)
  Either arm <12 events: 0 (0.0%)

ORIGINAL Subgroup Hc on BOOTSTRAP samples:
  Control arm <12 events: 0 (0.0%)
  Treatment arm <12 events: 0 (0.0%)
  Either arm <12 events: 0 (0.0%)

NEW Subgroups found: 477 (47.7%)

NEW Subgroup H* on ORIGINAL data:
  Control arm <12 events: 0 (0.0% of successful)
  Treatment arm <12 events: 0 (0.0% of successful)
  Either arm <12 events: 0 (0.0% of successful)

NEW Subgroup Hc* on ORIGINAL data:
  Control arm <12 events: 0 (0.0% of successful)
  Treatment arm <12 events: 2 (0.4% of successful)
  Either arm <12 events: 2 (0.4% of successful)
Code
summaries$diagnostics_table_gt
Bootstrap Diagnostics Summary
Analysis of 1000 bootstrap iterations
Category Metric Value
Success Rate1 Total iterations 1000
Successful subgroup ID 477 (47.7%)
Failed to find subgroup 523 (52.3%)
Success rating Poor ⚠
Subgroup H (Questionable) Unadjusted estimate 1.95 (1.04, 3.67)
Bias-corrected estimate 1.34 (0.81, 2.22)
Bias correction impact2 31.1%
CI width change3 2.64 -> 1.41
Subgroup Hc (Recommend) Unadjusted estimate 0.61 (0.47, 0.80)
Bias-corrected estimate 0.64 (0.44, 0.93)
Bias correction impact2 3.8%
CI width change3 0.33 -> 0.50
Bootstrap Quality: H Valid iterations 477
Mean (SD) 0.30 (0.39)
Coefficient of variation4 131.5%
Skewness5 0.01
Bootstrap Quality: Hc Valid iterations 477
Mean (SD) -0.45 (0.21)
Coefficient of variation4 47.3%
Skewness5 0.04
Search Performance Mean max HR found 2.39 (0.69)
Mean factors evaluated 43.4
Mean combinations tried 43
Proportion at maxk --
1 Success Rate: Proportion of bootstrap samples where ForestSearch identified a valid subgroup
2 Bias Correction Impact: Percentage change from unadjusted to bias-corrected estimate
3 CI Width Change: Confidence interval width before -> after bias correction
4 Coefficient of Variation: Standard deviation as % of mean (lower is better)
5 Skewness: Measure of asymmetry (0 = symmetric, |skew| < 1 is generally good)
Interpretation Guide:

Poor stability: Subgroup is rarely identified. Consider:

  • Reviewing subgroup criteria (n.min, hr.threshold)
  • Increasing sample size significantly
  • Simplifying search (reduce maxk)
  • Examining if subgroup is real or spurious

High variability: Bootstrap estimates are imprecise (CV >= 25%). Consider increasing nb_boots or sample size.

Code
summaries$subgroup_summary$original_agreement
                            Metric       Value
                            <char>      <char>
1:      Total bootstrap iterations        1000
2:           Successful iterations         477
3: Failed iterations (no subgroup)         523
4:       Exact match with original 309 (64.8%)
5:         Different from original 168 (35.2%)
Code
summaries$subgroup_summary$factor_presence
  Rank Factor Count   Percent
2    1     er   393 82.389937
5    2    pgr    34  7.127883
3    3 grade3    18  3.773585
6    4   size    16  3.354298
1    5    age    11  2.306080
4    6  nodes     5  1.048218
Code
summaries$subgroup_summary$factor_presence_specific
   Rank Base_Factor Factor_Definition Count  Percent
24    1          er         {er <= 0}   309 64.77987
Code
summaries$plots$combined